Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. The generative model is trained over our proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory. Experimental results on five benchmarks show that the proposed method outperforms state-of-the-art perception-driven SR methods in LPIPS, DISTS, PSNR, and SSIM metrics. The visual results also demonstrate the superiority of our method in perception-oriented reconstruction. The code and models are available at https://github.com/seungho-snu/SROOE.
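Recent studies have significantly improved the performance of single-image super-resolution (SR) using convolutional neural networks (CNNs). Although many high-resolution (HR) solutions can exist for a given input, most existing CNN-based methods do not explore alternative solutions during inference. A typical way to obtain alternative SR results is to train multiple SR models with different loss weightings and exploit combinations of these models. By taking advantage of multi-task learning, we present a more efficient method that trains a single adjustable SR model instead. Specifically, we optimize an SR model with a conditional objective during training, where the objective is a weighted sum of multiple perceptual losses at different feature levels. The weights vary according to given conditions, and the set of weights is defined as a style controller. In addition, we present an architecture suited to this training scheme: the Residual-in-Residual Dense Block equipped with spatial feature transformation layers. At inference, our trained model can generate locally different outputs conditioned on a style control map. Extensive experiments show that the proposed SR model produces various desirable reconstructions without artifacts and yields quantitative performance comparable to state-of-the-art SR methods.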
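The conditional objective described above, a weighted sum of perceptual losses taken at different feature levels, can be sketched in a few lines. This is only an illustration: the function name `conditional_objective`, the three loss values, and the weight vectors are hypothetical and are not taken from the paper.

```python
def conditional_objective(level_losses, style_weights):
    """Weighted sum of per-level perceptual losses.

    level_losses  -- one scalar loss per feature level (hypothetical values)
    style_weights -- one weight per level; the set of weights plays the role
                     of the "style controller" mentioned in the abstract
    """
    if len(level_losses) != len(style_weights):
        raise ValueError("need exactly one weight per feature level")
    return sum(w * l for w, l in zip(style_weights, level_losses))


# Hypothetical perceptual losses at three feature levels.
losses = [0.8, 0.5, 0.2]

# Two different conditions give two different training objectives.
low_level_only = conditional_objective(losses, [1.0, 0.0, 0.0])
mixed = conditional_objective(losses, [0.5, 0.25, 0.25])
```

Changing only the weight vector changes which feature levels dominate the objective, which is what lets one network cover several reconstruction styles.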
When designing a new API for a large project, developers need to make smart design choices so that their code base can grow sustainably. To ensure that new API components are well designed, developers can learn from existing API components. However, the lack of a standardized method for comparing API designs makes this learning process time-consuming and difficult. To address this gap, we developed API-Spector, to the best of our knowledge one of the first API-to-API specification recommendation engines. API-Spector retrieves relevant specification components written in OpenAPI (a widely adopted language used to describe web APIs). API-Spector makes several significant contributions, including: (1) novel methods of processing and extracting key information from OpenAPI specifications, (2) innovative feature extraction techniques that are optimized for the highly technical API specification domain, and (3) a novel log-linear probabilistic model that combines multiple signals to retrieve relevant and high-quality OpenAPI specification components given a query specification. We evaluate API-Spector on both quantitative and qualitative tasks and achieve an overall 91.7% recall@1 and 56.2% F1, which surpasses baseline performance by 15.4% in recall@1 and 3.2% in F1. Overall, API-Spector will allow developers to retrieve relevant OpenAPI specification components from a public or internal database in the early stages of the API development cycle, so that they can learn from existing established examples and potentially identify redundancies in their work. It provides the guidance developers need to accelerate the development process and contribute thoughtfully designed APIs that promote code maintainability and quality.
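As a rough illustration of how a log-linear model can combine several retrieval signals, and of how recall@1 is computed, consider the sketch below. The feature values, weights, and function names are hypothetical. The abstract does not specify API-Spector's actual features or parameters.

```python
import math


def log_linear_probs(feature_rows, weights):
    """Turn per-candidate feature vectors into a probability distribution.

    Each candidate gets score = sum_i weights[i] * features[i]; the scores
    are normalised with a softmax, the usual form of a log-linear model.
    """
    scores = [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_at_1(ranked_lists, relevant_sets):
    """Fraction of queries whose top-ranked candidate is relevant."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if ranked and ranked[0] in relevant
    )
    return hits / len(ranked_lists)


# Three hypothetical candidates, two hypothetical signals each.
candidates = [[0.9, 0.2], [0.4, 0.8], [0.1, 0.1]]
probs = log_linear_probs(candidates, weights=[2.0, 1.0])
best = max(range(len(probs)), key=probs.__getitem__)

# Two hypothetical queries: the first is answered correctly at rank 1.
score = recall_at_1([["a", "b"], ["c", "d"]], [{"a"}, {"d"}])
```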
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
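Falls are one of the leading causes of death among the elderly worldwide. Effective detection of falls can reduce the risk of complications and injuries. Fall detection can be performed using wearable devices or ambient sensors; these methods may struggle with user compliance issues or false alarms. Video cameras provide a passive alternative; however, regular RGB cameras are affected by changing lighting conditions and privacy concerns. From a machine learning perspective, developing an effective fall detection system is challenging because of the rarity and variability of falls. Many existing fall detection datasets lack important real-world considerations, such as varied lighting, continuous activities of daily living (ADLs), and camera placement. The lack of these considerations makes it difficult to develop predictive models that can operate effectively in the real world. To address these limitations, we introduce a novel multi-modality dataset (MUVIM) that contains four visual modalities: infrared, depth, RGB, and thermal cameras. These modalities offer benefits such as obfuscated facial features and improved performance in low-light conditions. We formulate fall detection as an anomaly detection problem, in which a customized spatio-temporal convolutional autoencoder is trained only on ADLs so that a fall increases the reconstruction error. Our results show that infrared cameras provide the highest level of performance (AUC ROC = 0.94), followed by thermal (AUC ROC = 0.87), depth (AUC ROC = 0.86), and RGB (AUC ROC = 0.83). This research provides a unique opportunity to analyze the utility of camera modalities in detecting falls in a home setting while balancing performance, passiveness, and privacy.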
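The anomaly-detection framing above scores each clip by how poorly a model trained only on normal activity reconstructs it, and evaluates those scores with AUC ROC. The sketch below shows the two calculations on made-up numbers. The autoencoder itself is not modelled here, and the scores and labels are hypothetical.

```python
def reconstruction_error(frame, reconstruction):
    """Mean squared error between a flattened frame and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)


def auc_roc(scores, labels):
    """AUC ROC as the probability that a positive outscores a negative.

    Ties count as half a win. labels use 1 for the anomalous class
    (here, a fall) and 0 for normal activity.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical per-clip anomaly scores and ground-truth labels (1 = fall).
scores = [0.10, 0.20, 0.90, 0.15, 0.80, 0.12]
labels = [0, 0, 1, 0, 1, 1]
auc = auc_roc(scores, labels)

# A frame that the model reconstructs badly gets a large error.
err = reconstruction_error([1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

The pairwise definition used here is quadratic in the number of clips; it is chosen for clarity rather than speed.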
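Recent trends suggest that general-purpose models such as BERT, GPT-3, and CLIP, trained on broad data at scale, exhibit a wide range of capabilities with a single learning architecture. In this work, we explore the possibility of general-purpose user representation learning by training a universal user encoder at large scale. We demonstrate that the scaling law holds in the user modeling domain, where the training error scales as a power law with the amount of compute. Our Contrastive Learning User Encoder (CLUE) optimizes task-agnostic objectives, and the resulting user embeddings stretch our expectations of what is possible in various downstream tasks. CLUE also shows great transferability to other domains and systems, as an online experiment shows significant improvements in online click-through rate (CTR). Furthermore, we investigate how performance changes with the scale-up factors, namely model capacity, sequence length, and batch size. Finally, we discuss the broader impacts of CLUE.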
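A power-law relation between training error and compute, as claimed above, is a straight line in log-log space, so its exponent can be estimated with ordinary least squares on the logarithms. The sketch below assumes the form error = a * compute^b and uses synthetic data. The constants 5.0 and -0.3 are invented for the example and are not results from the paper.

```python
import math


def fit_power_law(compute, error):
    """Fit error = a * compute**b by least squares in log-log space.

    Returns (a, b). A scaling law with decreasing error has b < 0.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(e) for e in error]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic, noise-free measurements generated from a known power law.
compute = [1e1, 1e2, 1e3, 1e4]
error = [5.0 * c ** -0.3 for c in compute]
a, b = fit_power_law(compute, error)
```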
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated without sharing the raw training data among the sources, thus minimizing the risk of privacy leaks. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
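For readers unfamiliar with Random Fourier Features, the sketch below shows only the basic building block: an explicit random feature map whose inner product approximates a Gaussian (RBF) kernel. It does not reproduce the paper's loss decomposition or transfer coefficients. The bandwidth, feature count, and test points are arbitrary assumptions.

```python
import math
import random


def make_rff(dim, num_features, sigma, seed=0):
    """Build a random feature map phi with phi(x).phi(y) ~ exp(-|x-y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/sigma^2), phases from U(0, 2*pi).
    freqs = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return phi


def rbf_kernel(x, y, sigma):
    """Exact Gaussian kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


phi = make_rff(dim=2, num_features=4000, sigma=1.0)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(p * q for p, q in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y, sigma=1.0)
```

Because the feature map is explicit, a kernel-based loss becomes a sum over ordinary feature vectors, which is the property that makes per-source decomposition plausible in a federated setting.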
Learning to predict masked tokens in a sequence has been shown to be a powerful pretraining objective for large-scale language models. After training, such masked language models can provide distributions of tokens conditioned on bidirectional context. In this short draft, we show that such bidirectional conditionals often demonstrate considerable inconsistencies, i.e., they cannot be derived from a coherent joint distribution when considered together. We empirically quantify such inconsistencies in the simple scenario of bigrams for two common styles of masked language models: T5-style and BERT-style. For example, we show that T5 models often confuse their own preferences regarding two similar bigrams. Such inconsistencies may represent a theoretical pitfall for research on sampling sequences based on the bidirectional conditionals learned by BERT-style MLMs. This phenomenon also means that T5-style MLMs capable of infilling will generate discrepant results depending on how much masking is given, which may represent a particular trust issue.
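One way to make the notion of inconsistent conditionals concrete for bigrams is a four-term ratio test. If p(x1 | x2) and p(x2 | x1) both come from a single joint distribution, the product of conditional ratios around the cycle (a, b) -> (a', b) -> (a', b') -> (a, b') -> (a, b) equals exactly 1. The sketch below is an illustrative check under that observation, not the measurement procedure used in the paper. The toy joint distribution and the perturbation are invented.

```python
def conditionals_from_joint(joint):
    """Derive p(first | second) and p(second | first) from a bigram joint."""
    first_given_second, second_given_first = {}, {}
    for (a, b), p in joint.items():
        first_given_second.setdefault(b, {})[a] = p
        second_given_first.setdefault(a, {})[b] = p
    for table in (first_given_second, second_given_first):
        for row in table.values():
            total = sum(row.values())
            for key in row:
                row[key] /= total
    return first_given_second, second_given_first


def cycle_ratio(first_given_second, second_given_first, a, a2, b, b2):
    """Product of conditional ratios around a 4-cycle; 1.0 if coherent."""
    r1 = first_given_second[b][a] / first_given_second[b][a2]
    r2 = second_given_first[a2][b] / second_given_first[a2][b2]
    r3 = first_given_second[b2][a2] / first_given_second[b2][a]
    r4 = second_given_first[a][b2] / second_given_first[a][b]
    return r1 * r2 * r3 * r4


# A toy joint distribution over four bigrams (values are made up).
joint = {
    ("new", "york"): 0.4, ("new", "jersey"): 0.1,
    ("old", "york"): 0.2, ("old", "jersey"): 0.3,
}
fgs, sgf = conditionals_from_joint(joint)
coherent = cycle_ratio(fgs, sgf, "new", "old", "york", "jersey")

# Overwrite one conditional so that no joint can explain both tables.
sgf["new"] = {"york": 0.5, "jersey": 0.5}
incoherent = cycle_ratio(fgs, sgf, "new", "old", "york", "jersey")
```

A ratio far from 1 means the two sets of conditionals disagree about the relative preference between the bigrams involved.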
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
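The two-stage idea above, a per-frame classifier followed by a smoother that accounts for flashing lights, can be illustrated with a trailing moving average. The abstract does not say which smoother the authors use, so the moving average, the window length, and the threshold below are all assumptions chosen for illustration. The per-frame scores stand in for the CNN's outputs.

```python
from collections import deque


def smooth_scores(frame_scores, window=5):
    """Trailing moving average of per-frame scores."""
    recent = deque(maxlen=window)
    smoothed = []
    for score in frame_scores:
        recent.append(score)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def is_active(frame_scores, window=5, threshold=0.4):
    """Per-frame decision after smoothing (hypothetical threshold)."""
    return [m >= threshold for m in smooth_scores(frame_scores, window)]


# Flashing lights: the frame-level score alternates between high and low.
flashing = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1]
# A single spurious high score among otherwise low scores.
spurious = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1]

flashing_decisions = is_active(flashing, window=4, threshold=0.4)
spurious_decisions = is_active(spurious, window=4, threshold=0.4)
```

With these settings the alternating sequence stays above the threshold even on its dark frames, while the isolated spike never raises the average enough to trigger a detection.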